Base sequencing apparatus

ABSTRACT

An on-line system base sequencing apparatus wherein calibration coefficients for time bases of respective electrophoresis lanes are evaluated from differences between positions of signals already outputted in a range causing no sequence inversion and positions of substantially regular intervals for originally outputting signals, and time bases as to the respective electrophoresis lanes are calibrated with the calibration coefficients, thereby obtaining correct base sequence. Thus, the bases can be correctly sequenced even if electrophoresis speed differences are caused between the electrophoresis lanes.

This is a continuation-in-part of application Ser. No. 07/737,416 filed Jul. 29, 1991.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an on-line system base sequencing apparatus for introducing nucleic acid fragment samples, which are pretreated by the Sanger method using fluorescent primers (primers obtained by chemically bonding fluorescent materials as markers), into a sample introducing part of a slab type gel of a gel electrophoresis apparatus in units of end bases. The samples are simultaneously electrophoresed and the fluorescence is detected during the electrophoresis by excitation using an optical detection system that scans in a direction perpendicular to the electrophoresis direction. The base sequencing is then performed with a data processing unit programmed to carry out a number of functions to prevent the effects of smiling.

2. Description of the Background Art

In an on-line system base sequencing apparatus using a slab type gel, nucleic acid fragments which are previously treated by the Sanger method are electrophoresed in different electrophoresis lanes in response to the types A (adenine), G (guanine), T (thymine) and C (cytosine) of the end bases thereof.

In general, a gel electrophoresis apparatus creates an undesirable so-called "smiling" effect, which is a phenomenon wherein the electrophoresis speeds vary with electrophoresis lanes. If the signals are successively read from the electrophoresis lanes for the end bases A, G, T and C and the samples are directly base-sequenced without accounting for the "smiling" effect, the sequence can be inverted and result in misreading thereof.

It is believed that "smiling" is mainly caused by varying temperature distributions in the electrophoresis lanes caused by Joule heating, which is generated as the result of electrophoresis. In order to prevent "smiling", there has been proposed a method wherein a metal plate is placed in close contact with the electrophoresis plate, thereby evening the temperature distribution. Another method of solving varying temperature distributions in the electrophoresis lanes involves storing the electrophoresis plate in a closed container and supplying air controlled at a constant temperature, thereby homogenizing the temperature, as disclosed in Japanese Patent Laying-Open Gazette No. 2-143145 (1990).

While the mobility difference between 500 bases and 501 bases is 0.2% (=1/500), for example, it is necessary to control any temperature irregularity to be not more than 0.1° C. in order to suppress the mobility difference caused by "smiling" to be not more than 0.2% by temperature control. In practice, this type of temperature control is very difficult to maintain.

Another problem associated with conducting a base electrophoresis involves electrophoresis gel containing electrolytic ammonium persulfate as a catalyst, which tends to migrate toward the side of an external electrode buffer following electrophoresis. If the concentration of this electrolyte is varied with position, differences in ionic strength can occur between the different electrophoresis lanes, resulting in "smiling." This type of "smiling" resulting from non-homogeneous concentration of the electrolyte cannot be prevented by temperature control.

In addition to the aforementioned types of "smiling", misreading of the base sequence is also caused by non-homogeneous sample introduction slots of the electrophoresis lanes. In general, electrophoresis distances (i.e. distances between sample introduction slots and a detection part) of an on-line system fluorescent DNA sequencing apparatus are about 200 to 500 mm. At the sample introduction slots, sample positions can easily be displaced by 1 to 2 mm from each other by differences in horizontal position of the gel formation, penetration of urea within the slots, and the like. A difference of 1 mm at the sample introduction slots corresponds to a difference of 0.2% (1/500) assuming the electrophoresis distance to be 500 mm. This is equal to the mobility difference between bases 500 and 501. It is impossible to prevent such mobility difference by temperature control.

SUMMARY OF THE INVENTION

An object of the present invention is to provide an improved base sequencing apparatus.

Another object of the present invention is to provide a base sequencing apparatus, which can correctly sequence bases even if mobility differences between lanes result from "smiling" or the like.

According to the present invention, calibration coefficients for time bases of respective electrophoresis lanes are evaluated from differences between the positions of signals already outputted in a range in which no sequence inversion is caused, and positions of substantially regular intervals for originally outputting signals, and subsequent time bases for the respective electrophoresis lanes are calibrated with the resulting calibration coefficients, thereby attaining the correct base sequence.

As shown in FIG. 1, a data processing unit for the inventive base sequencing apparatus comprises: signal storage means 40 for storing signals from respective electrophoresis lanes with respect to time; maximum signal time detection means 42 for detecting time providing maximum values of signals as to the respective electrophoresis lanes; maximum signal time storage means 44 for storing the maximum signal times; appearance time estimation means 46 for calculating appearance times of the maximum signals from the electrophoresis lanes other than a reference lane selected from four (4) electrophoresis lanes appearing between two maximum signals of the reference lane on the assumption that there is no difference of mobility between the electrophoresis lanes; calibration coefficient calculation means 48 for calculating calibration coefficients from ratios of the maximum signal times of the three (3) electrophoresis lanes calculated in the appearance time estimation means 46 to actual maximum signal times; time base calibration means 50 for calibrating the time bases of the three electrophoresis lanes with the calibration coefficients; and base sequencing means 52 for performing base sequencing from the maximum signal times of the reference lane and the three (3) electrophoresis lanes based on the calibrated time bases.

The appearance time estimation means 46 performs calculations on the assumption that the maximum signals of the three (3) electrophoresis lanes appear between two maximum signals of the reference lane at regular intervals, for example.

In order to update the calibration coefficients, a calibration coefficient is calculated every time a maximum signal appears on the reference lane between this signal and a maximum signal appearing on the reference lane immediately ahead thereof. Then, the time bases of the remaining three (3) electrophoresis lanes, which are effective until a next maximum signal appears on the reference lane, can be calculated with the updated calibration coefficient.

In an on-line system base sequencing apparatus, signals appear in order of length (e.g. from short nucleic acid fragments to long nucleic acid fragments) with the lapse of time. While the signals themselves are broad and sequence misreading by sequence inversion is easily caused by "smiling" in subsequently appearing long nucleic acid fragments, the signals are sharp in previously appearing short nucleic acid fragments, so that no sequence inversion and hence no misreading occurs even if "smiling" exists. Even for short nucleic acid fragments that have already appeared, however, "smiling" displaces the signals from their positions of original appearance (i.e. positions at regular time intervals), although no sequence inversion occurs.

As shown in FIG. 8, a data processing unit for a base sequencing apparatus according to the present invention comprises signal storage means 40 for storing signals from respective electrophoresis lanes with respect to time; maximum signal time detection means 42 for detecting time values providing maximum values of signals as to the respective electrophoresis lanes; maximum signal time storage means 44 for storing the maximum signal times; appearance time estimation means 46 for calculating appearance times of maximum signals of the electrophoresis lanes other than a standard lane, being selected from four (4) electrophoresis lanes, appearing between two maximum signals of the standard lane with the time base of the standard lane on the assumption that there is no difference of mobility between the electrophoresis lanes; calibration coefficient calculation means 48 for calculating calibration coefficients from ratios of the maximum signal times of the three (3) electrophoresis lanes calculated in the appearance time estimation means 46 to actual maximum signal times; means 101 for converting time bases of signals in time domains other than those used for calculating the calibration coefficients through the calculated calibration coefficients; means 102 for deciding validity of the calibration coefficients depending on whether or not the as-converted signals appear at uniform time intervals; total time base calibration means 50 for calibrating the total time bases of the three (3) electrophoresis lanes through calibration coefficients being decided as being valid; and base sequencing means 52 for performing base sequencing from the maximum signal times of the four electrophoresis lanes based on the calibrated time bases. The appearance time estimation means 46 performs calculations on the assumption that the maximum signals of the three (3) electrophoresis lanes appear between two maximum signals of the standard lane at regular intervals, for example.

Means 103 is adapted to change calculation domains for re-calculating calibration coefficients, when the means 102 decides that the calibration coefficients are invalid.

Means 104 is adapted to change the standard lane when all calibration coefficients of the three (3) lanes are not decided to be valid with respect to the set standard lane.

In an on-line base sequencing apparatus, signals appear in order from the shortest nucleic acid fragment to the longest fragment, as shown in FIG. 9. Referring to FIG. 9, curves for lanes 1, 2, 3 and 4, which must originally be absolutely identical, are separated from each other due to temperature irregularity. If the temperature irregularity is in a stationary state, however, the curves are in such proportional relationship to each other that the ratio a1 to a2 is constant at every point. As understood from FIG. 9, signals themselves are broad and base sequences are easy to misread (inverted) due to "smiling" as to long nucleic acid fragments appearing later (see sampling points 4, 5 and 6 in FIG. 9), while signals are sharp and no inversion nor misreading of base sequences takes place even if "smiling" is caused, as to short nucleic acid fragments appearing in advance (refer to sampling points 1, 2 and 3 in FIG. 9). When the short nucleic acid fragments have already appeared, however, there is recognized displacement from the original appearance positions (positions substantially at regular time intervals) if "smiling" is caused, although the base sequences are not inverted.

According to the present invention, calibration coefficients for time bases of respective electrophoresis lanes are obtained from displacement between positions of already appearing signals in a range causing no inversion of base sequences and original signal appearance positions which are substantially at regular intervals so that time bases for the subsequent respective electrophoresis lanes are calibrated through the as-obtained calibration coefficients, thereby attaining correct base sequencing.

Although the signals are correctly obtained in such an apparatus, peaks may simultaneously appear on a plurality of lanes due to a chemical cause, or vertical positions of the peaks may be so non-uniform that the signals are incorrectly recognized to cause erroneous evaluations. The means 101, 102 and 103 shown in FIG. 8 are adapted to confirm validity of the temporarily calculated calibration coefficients by applying the same to other time domains. Thus, the calibration coefficients are improved in accuracy, thereby improving accuracy of the base sequencing.

According to the present invention, one of the electrophoresis lanes for A, G, T and C is selected as a reference lane so that the time bases of the remaining three (3) electrophoresis lanes are calibrated with reference to the distance between two peaks of the reference lane. Thus, it is possible to correctly sequence bases even if there are differences between electrophoresis speeds of the electrophoresis lanes due to "smiling" or the like.

Further, the inventive method can cope with intermediate changes of electrophoresis states. The present invention can also cope with "smiling" resulting from a cause other than varying temperature distributions.

The present invention can further cope with a system of taking data and sequencing bases successively from data taken in a multitask manner.

The present invention can be combined with a method of homogenizing temperatures by ambient air control. The accuracy is further improved in this case.

The data processing unit of the base sequencing apparatus according to the present invention can be programmed with conventional computer software to carry out the above described functions in various manners. One skilled in the art of programming equipment of this nature can design the requisite program without undue experimentation, or without special skill or knowledge, based on the information contained in this disclosure.

The foregoing and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing functions of a signal processing microcomputer in an embodiment of the present invention;

FIG. 2 is a perspective view schematically showing the embodiment;

FIG. 3 illustrates signals of respective electrophoresis lanes measured in the embodiment;

FIG. 4 is a flow chart showing procedure for calculating calibration coefficients;

FIG. 5 illustrates exemplary maximum value data and definitive base sequence;

FIG. 6 is a flow chart showing the procedure of pretreatment for discontinuous electrophoresis conditions;

FIG. 7 illustrates exemplary detection signals in relation to fluorescent primers;

FIG. 8 is a block diagram showing functions of a signal processing microcomputer in an embodiment of the additional invention;

FIG. 9 illustrates relations between peak appearance times and DNA fragment base lengths;

FIG. 10 illustrates signals of respective electrophoresis lanes measured in the embodiment; and

FIG. 11 is a flow chart showing the operation of the embodiment.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

An embodiment of the base sequencing apparatus according to the present invention is shown in FIG. 2.

A slab-type electrophoresis gel 2 is prepared from a polyacrylamide gel. Both ends of the electrophoresis gel 2 are dipped in electrode layers 4 and 6, which contain electrolytic solutions. An electrophoresis power source 8 applies an electrophoresis voltage across the electrode layers 4 and 6.

Sample introduction slots 10 are provided in one end of the electrophoresis gel 2 in order to inject samples. Samples of respective end bases are introduced into prescribed positions of the respective slots 10. These samples are prepared from four types of DNA fragments which are labelled by FITC, being a fluorescent material, by a well known method and so treated that respective bases A, G, T and C come to ends by the Sanger method. The FITC is excited with an argon laser beam of 488 nm in wavelength, and generates fluorescence of 520 nm in wavelength.

When the power source 8 applies the electrophoresis voltage, the samples are electrophoresed in the electrophoresis gel 2 with time in an electrophoresis direction 14 as electrophoresis bands 16 separate and reach the measuring portion.

The measuring portion is provided with an excitation system for applying excitation light from an argon laser 18, which emits a laser beam of 488 nm in wavelength, by a condenser lens 20 and a mirror 21, and a detection system for collecting fluorescent light generated from fluorescent materials forming the electrophoresis bands 16 which are present in positions irradiated with the exciting laser beam by an objective lens 22 and detecting the fluorescent light by a photomultiplier 28 through an interference filter 24 of 520 nm, a condenser lens 26 and an optical fiber tube 27. The excitation and detection optical systems including the condenser lens 20, the mirror 21, the objective lens 22, the interference filter 24, the condenser lens 26 and the optical fiber tube 27 are provided on a scanning stage 30, which mechanically moves to scan on a measuring line in a direction (scanning direction 29) where the position irradiated with the excitation light beam intersects with the electrophoresis direction 14 every constant period.

Detection signals (fluorescent signals) from the photomultiplier 28 are incorporated in a signal processing microcomputer 31, which is a data processing unit, through an amplifier and an A-D converter 32. The microcomputer 31 also incorporates signals corresponding to positions irradiated with the excitation beam on the measuring position of the electrophoresis gel 2 as scan data. Thus, the overall fluorescent signals obtained by scanning of the excitation and detection optical systems in the scanning direction are incorporated in the microcomputer 31 with position information.

The operation of this embodiment is now described with reference to FIGS. 3, 4 and 5.

FIG. 3 shows exemplary signals obtained in the base sequencing apparatus shown in FIG. 2. Symbols A, G, T and C correspond to the respective electrophoresis lanes, while the symbol Nt represents numbers of scannings made by the optical systems in the scanning direction.

FIG. 3 shows portions having relatively short base lengths, and no inversion of signal appearance order, i.e., no sequence misreading, is caused even if smiling takes place, since the difference between electrophoresis speeds per base length is relatively large. Referring to FIG. 3 in more detail, however, eight peaks of G, T and C are present between 11-th and 20-th peaks of A. When lines are drawn on the assumption that the eight peaks appear between the two peaks of A at regular intervals, it is understood that the peaks of G are delayed from the lines and the peaks of T appear slightly ahead of the lines although the peaks of C are present substantially on the lines.

When samples are electrophoresed under a constant voltage, the peaks will not appear strictly at regular intervals in the structure shown in FIG. 2. However, when time bases are calibrated between two peaks of a certain reference electrophoresis lane, i.e., the lane of A in this case as shown in FIG. 3, approximation of such regular intervals is sufficient since the interval between the two peaks of the reference lane is about 20 to 30 bases at the most.

As to the signals shown in FIG. 3, it is obvious that the appearance order is inverted in due course of time to cause errors in base sequencing since the peaks of G are delayed and those of T are ahead as compared with those of A and C.

The procedure for calibrating the time bases is now described with reference to a flow chart shown in FIG. 4.

In an initial state, the detected signals of A, G, T and C shown in FIG. 3 are in sequence RA(16000), RG(16000), RT(16000) and RC(16000). Dimensions RA(16000), RG(16000), RT(16000) and RC(16000) define an array of digitalized values of detected phosphorescence signals as to the electrophoresis lanes of A, G, T and C, respectively. The sequence RA(16000) starts from RA(0). This also applies to the other sequences RG, RT and RC. However, the sequences RG, RT and RC are rewritten in accordance with progress of the program.

Initial values of sequence A(1000), G(1000), T(1000) and C(1000) for storing maximum value data are zero, and peaks are so numbered that an m-th peak appearing on G is G(m). Initial values of calibration coefficients Kg, Kt and Kc are set at 1. Signals for which time bases are calibrated as to the electrophoresis lanes of G, T and C are temporarily stored in sequence RG1(16000), RT1(16000) and RC1(16000), whose initial values are matched with measured values. Sequence SEQ$(1000) is adapted to store the definitive base sequence, whose initial values are zero.

A sequence number NSEQ is initially set at 1, and a time number Nt is set at 2 (steps S1 and S2). Symbol Nt represents scan numbers in this embodiment.

The electrophoresis lane of A is assumed to be the reference lane, and a determination is made as to whether or not a signal A is a maximum signal (step S3). If the signal A is a maximum signal, it is assumed that SEQ$(NSEQ)=A and A(NSEQ)=Nt as base sequence (step S4). A determination is made as to whether or not there is a maximum signal of A ahead of this maximum, and if this maximum is the first one, 1 is added to the sequence number NSEQ as well as to the time number Nt (steps S5, S6, S7, S8 and S9), and the process returns to the step S3 to repeat the processing.

If the signal A is not a maximum signal, a determination is made as to whether or not there is a maximum signal in G, T or C in steps S10 to S12, and if there is no maximum signal, 1 is added to the time number Nt (steps S8 and S9), and the same operation is repeated again. If a maximum signal is found in G, T or C, the bases are sequenced (steps S13, S14 and S15) and 1 is added to the sequence number NSEQ as well as to the time number Nt (steps S16, S8 and S9), and the process returns to the step S3 to repeat the processing.
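
By way of illustration only, the loop of steps S1 to S16 can be sketched in Python as follows. This is a minimal sketch and not the program of the embodiment: the function names `is_local_max` and `call_bases`, the three-point test for a maximum and the noise `threshold` are assumptions introduced here, and the calibration steps S17 to S22 are omitted.

```python
def is_local_max(signal, nt, threshold=0.0):
    """Return True if signal[nt] is a strict local maximum above threshold.

    The three-point comparison is an assumed peak criterion; the text does
    not specify how a "maximum signal" is recognized.
    """
    return (signal[nt] > threshold
            and signal[nt] > signal[nt - 1]
            and signal[nt] > signal[nt + 1])


def call_bases(lanes, threshold=0.0):
    """Scan the four lanes in time order and record one base per peak.

    lanes: dict mapping "A", "G", "T", "C" to lists of digitized signals.
    Returns (sequence, peak_times); peak_times[i] is the scan number of
    the i-th called base.
    """
    n = min(len(s) for s in lanes.values())
    sequence, peak_times = [], []
    for nt in range(1, n - 1):
        # The reference lane A is tested first, then G, T and C, and at
        # most one base is recorded per scan, as in the flow chart.
        for base in "AGTC":
            if is_local_max(lanes[base], nt, threshold):
                sequence.append(base)
                peak_times.append(nt)
                break
    return "".join(sequence), peak_times
```

The returned peak times play the role of the maximum value data A(m), G(m), T(m) and C(m), from which the calibration coefficients are subsequently evaluated.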

Referring again to FIG. 3, the 11-th peak A appears at Nt=2020-th scanning and the 20-th peak A appears at Nt=2200-th scanning, and peaks of other electrophoresis lanes are also detected to evaluate maximum value data as shown in FIG. 5. The base sequence SEQ$(1000) is defined as AGTTC ... ... as shown in FIG. 5.

Steps S17 to S22 shown in FIG. 4 are adapted to evaluate calibration coefficients. In the example shown in FIG. 3, the process advances to the step S17 upon appearance of the 20-th peak A to calculate scan numbers of enclosed peaks in FIG. 5, i.e., scan numbers of the 19-th peak C, the 18-th peak G and the 17-th peak T in proportional distribution as values attained on the assumption that the peaks appear at regular intervals (this corresponds to vertical lines in FIG. 3) and to take ratios thereof to measured scan numbers, thereby calculating calibration coefficients of the respective electrophoresis lanes of G, T and C.

In relation to the signals shown in FIG. 3, the peak time (scan number) of the actual signal is 2170 as to the electrophoresis lane G, and the peak time calculated in proportional distribution on the assumption that the peaks appear at regular intervals is 2160, whereby the calibration coefficient Kg for G is as follows:

    Kg=2160/2170=0.99539

Also as to the electrophoresis lane T, the calibration coefficient Kt is similarly calculated as follows:

    Kt=2140/2135=1.0023419

As to the lane C, the calibration coefficient Kc is equal to 1.
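
The proportional distribution used above can be reproduced with a few lines of Python. The sketch below uses only the scan numbers quoted in the text (11-th peak A at 2020, 20-th peak A at 2200, measured peaks at 2170 for G and 2135 for T); the function names are hypothetical and introduced for illustration.

```python
def expected_peak_times(t_first, t_last, n_intervals):
    """Evenly spaced peak times from t_first to t_last inclusive."""
    pitch = (t_last - t_first) / n_intervals
    return [t_first + k * pitch for k in range(n_intervals + 1)]


def calibration_coefficient(expected, observed):
    """Ratio of the expected (regular-interval) time to the measured time."""
    return expected / observed


# Peaks 11 to 20 span nine intervals between scans 2020 and 2200,
# giving a pitch of 20 scans per base.
times = expected_peak_times(2020, 2200, 9)

# Index k in `times` corresponds to peak number 11 + k.
kg = calibration_coefficient(times[18 - 11], 2170)  # 18-th peak, lane G
kt = calibration_coefficient(times[17 - 11], 2135)  # 17-th peak, lane T

print(times[18 - 11], times[17 - 11])  # expected scans for peaks 18 and 17
print(kg, kt)
```

The expected scan numbers come out as 2160 for the 18-th peak and 2140 for the 17-th peak, in agreement with the numerators of Kg and Kt.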

The times of the respective sequences RG, RT and RC following 2200 are multiplied by the respective calibration coefficients evaluated in the aforementioned manner, to rewrite the data of RG, RT and RC and introduce the same to RG1, RT1 and RC1.
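
One possible way of rewriting a lane signal on a calibrated time base is sketched below. The text states only that the times are multiplied by the calibration coefficient; resampling onto the integer scan grid by linear interpolation, and the function name `calibrate_time_base`, are assumptions made for this illustration.

```python
def calibrate_time_base(signal, k, start):
    """Re-time the samples of `signal` from index `start` onward.

    A sample measured at scan t is moved to scan k * t. The calibrated
    value at integer scan n is therefore read from the measured signal at
    time n / k, using linear interpolation (an assumed resampling rule).
    """
    out = list(signal)
    last = len(signal) - 1
    for n in range(start, len(signal)):
        src = n / k                # measured time that maps onto scan n
        i = int(src)
        if i >= last:
            out[n] = signal[last]  # past the end of the measured data
            continue
        frac = src - i
        out[n] = (1 - frac) * signal[i] + frac * signal[i + 1]
    return out
```

With a coefficient smaller than 1, as for the delayed lane G, a peak measured late is moved to an earlier scan; with a coefficient larger than 1, as for lane T, it is moved later.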

This procedure is repeated every appearance of the peak A.

The flow chart shown in FIG. 4 is on the premise that the electrophoresis conditions are constant with constant voltage, constant current and constant power, for example, upon starting of electrophoresis at least from appearance of signals, while correction is required when the electrophoresis conditions are changed during electrophoresis, for example. FIG. 6 shows an exemplary data pretreatment method in relation to discontinuous change of electrophoresis conditions. In the example shown in FIG. 6, a data train is finally converted to electrophoresis data under a constant voltage of 1 KV.

First, data of time voltage products are produced as to the respective times from measured electrophoresis voltage data RV(J) (unit: KV) as follows:

    SV(J)=RV(1)+RV(2)+ ... +RV(J)

where symbol SV(J) represents a monotone increasing function of J.

In order to evaluate respective data RA1(M), RG1(M), RT1(M) and RC1(M) (M=1 ... 16000) of 1 KV constant voltage conversion values, the value of J providing the first SV(J) which is greater than M, for example, may be evaluated to apply values of original signals with respect to J. For example, RA1(M)=RA(J), RG1(M)=RG(J) ... .
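
The pretreatment can be sketched in Python as follows, on the assumption that SV(J) is the running sum of the measured voltages RV up to scan J with a unit scan interval. The function name `to_constant_voltage`, the zero-based indexing and the handling of exhausted data are choices made for this illustration only.

```python
from itertools import accumulate


def to_constant_voltage(rv, lanes, n_out):
    """Convert lane signals to an equivalent 1 KV constant-voltage time base.

    rv:    measured electrophoresis voltage (KV) for each scan.
    lanes: dict mapping a lane name to its list of measured signal values.
    n_out: number of converted samples requested (M = 1 ... n_out).
    """
    sv = list(accumulate(rv))      # assumed time-voltage product per scan
    out = {name: [] for name in lanes}
    j = 0
    for m in range(1, n_out + 1):
        # Advance to the first J whose SV(J) is greater than M.
        while j < len(sv) and sv[j] <= m:
            j += 1
        if j == len(sv):
            break                  # measured data exhausted
        for name, sig in lanes.items():
            out[name].append(sig[j])
    return out
```

For a run measured at a constant 2 KV, for example, each measured sample is spread over two converted samples, so that the converted data train corresponds to a run of twice the duration at 1 KV.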

After the pretreatment is performed according to the flow chart shown in FIG. 6, the process advances to the step S1 shown in FIG. 4.

The first peak of the reference lane may be a fluorescent primer. In this case, peaks simultaneously appear at A, G, T and C of the first sequence numbers as shown in FIG. 7, for example. Also in this case, the flow chart shown in FIG. 4 requires no change but effectuates the same function.

In the flow chart shown in FIG. 4, the calibration coefficients are calculated every time a peak appears in the electrophoresis lane A, which is the reference lane, to repeat calibration of the time bases of the remaining three lanes. Alternatively, calibration coefficients once calculated in portions having relatively short base lengths, for example, may also be employed for portions having long base lengths.

The embodiment of the base sequencing apparatus is identical to that described with reference to FIG. 2.

The operation of the embodiment shown in FIGS. 10 and 11 is now described.

FIG. 10 shows exemplary signals obtained in the base sequencing apparatus. Symbols A, G, T and C correspond to respective electrophoresis lanes, while times shown on the abscissa axes are in one-to-one correspondence to numbers of scanning operations made by optical systems perpendicularly to the electrophoresis direction.

FIG. 10 shows portions having relatively short base lengths, and no inversion of signal appearance order (i.e. no sequence misreading) is caused even if "smiling" takes place, since differences between electrophoresis speeds per base length are large. Referring to FIG. 10 in more detail, three peaks of G, C and G are present in domain 1. When lines are drawn on the assumption that three peaks appear between two peaks of A at regular intervals, it is understood that the peaks of G are delayed from the lines while the peak of C appears slightly ahead of the lines. When lines are similarly drawn in a domain 2, it is understood that a peak of T is delayed as compared with peaks of A.

The procedure from calibration of time bases to base sequencing is now described with reference to a flow chart shown in FIG. 11.

CALCULATION OF CALIBRATION COEFFICIENTS

A "smiling" calibration coefficient (i.e. mobility ratio with respect to a standard lane) is calculated for the short DNA fragments where peaks are clear and no emergence order exchange occurs (steps S31, S32, S33, S34). The standard lane can be chosen arbitrarily in principle; if the program cannot determine the calibration coefficients, the standard lane is exchanged (see step S42).

In practice, the G or C lane sometimes exhibits compressions; however, the algorithm will reject such regions automatically. In the following explanation, assume lane A is determined as the standard. In FIG. 10, the dots on the time axes indicate the expected emergence times calculated under an assumption that the pitch of peak emergences is constant (=unit pitch) within a restricted period between two consecutive peaks in lane A (A--A domain). For example, in domain 1, unit pitch=(ta2-ta1)/4, where the divisor 4 corresponds to the number of peak intervals between the lane A peaks (the three intermediate peaks plus one). This assumption means that, within the A--A domain (on the order of 10-20 bases), each plot in FIG. 9 approximates a straight line. The discrepancies between the expected and observed peak emergence times are indicated as Δtg, Δtc, and Δtt. In the first cycle, the domain for calculation is assumed to be set to domain 1, and the calibration coefficients for lanes G and C can then be calculated as Δtg/tg and Δtc/tc, respectively. The coefficient for the T lane is calculated in the second cycle, where the domain for calculation shifts to domain 2.
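
The unit pitch and the discrepancies for one A--A domain can be sketched as follows. The peak times in the example are hypothetical values chosen for illustration and are not taken from FIG. 10; the function names and the sign convention (observed time minus expected time) are likewise assumptions.

```python
def unit_pitch(ta_start, ta_end, n_intervals):
    """Pitch of peak emergence inside one A--A domain."""
    return (ta_end - ta_start) / n_intervals


def discrepancies(ta_start, pitch, observed):
    """Compare observed peak times with the regular-interval expectation.

    observed: dict mapping the slot index of a peak (1 = first peak after
    the opening A peak) to its observed emergence time.
    Returns a dict mapping slot -> (delta_t, delta_t / observed time).
    """
    result = {}
    for slot, t_obs in observed.items():
        t_expected = ta_start + slot * pitch
        delta = t_obs - t_expected      # assumed sign: positive = delayed
        result[slot] = (delta, delta / t_obs)
    return result


# Hypothetical domain 1: A peaks at scans 1500 and 1580 with three peaks
# (G, C, G) between them, hence four intervals.
pitch = unit_pitch(1500, 1580, 4)
found = discrepancies(1500, pitch, {1: 1523, 2: 1538, 3: 1563})
print(pitch, found)
```

In this made-up example the two G peaks are delayed by three scans and the C peak is two scans early, which mirrors the qualitative behaviour described for domain 1.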

EXAMINATION OF COEFFICIENTS (STEPS S35, S36 AND S40)

The calculated coefficients are examined in the consecutive or later signal domain by checking the variance of pitches between the peak emergence times after applying the time axis transformation according to the calculated coefficients. The detailed procedure is:

(i) the time axes of the examined lanes (e.g. G and C lanes for the first cycle) are divided by the calculated coefficients.

(ii) in order to calculate the peak number in an A--A period containing the consecutive peaks in the examined lane, the A--A period length is divided by the unit pitch in Step S34 and then the quotient is rounded to an integer, assuming unit pitch varies very slightly throughout.

[peak number in dom. 2 =(ta3-ta2)/unit pitch of Step S34, rounded to an integer]

(iii) in order to estimate the unit pitch around the examined peak (tg3' or tc2'), the A--A period length is divided by the peak number from (ii).

[unit pitch in dom. 2 =(ta3-ta2)/peak number in dom. 2 ]

This describes the algorithm in general. In the example of FIG. 10, this is equal to (ta4-ta3)/4.

[unit pitch in dom. 3 =(ta4-ta3 )/peak number in dom. 3 ]

(iv) the pitch between the examined peak and the previous A peak is divided by the unit pitch of (iii). The decimal fraction after integer subtraction of the quotient represents the phase discrepancy of peak emergence, by which the consistency can be judged.

[phase discrepancy =fractional part of ((examined peak time - previous A peak time)/unit pitch of (iii))]

(v) if the phase discrepancy of (iv) is near 0 or 1 (in this experiment, the error allowance was set to one unit of time-axis resolution), the calculated "smiling" coefficient is regarded as correct. If not, it is regarded as wrong, the A--A domain for calculation is changed to the consecutive one and the program goes back to step 2 for re-calculation (steps S36, S40).
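
Steps (ii) to (v) can be sketched in Python as shown below. The sketch assumes that the peak times have already been transformed as in step (i), that the examined peak does not precede the previous A peak, and that the allowance of one unit of time-axis resolution is expressed as a fraction of the unit pitch; the function names are hypothetical.

```python
import math


def peak_number(period, prior_pitch):
    """Step (ii): number of peak intervals in an A--A period."""
    return int(period / prior_pitch + 0.5)   # round half up


def local_unit_pitch(period, n_peaks):
    """Step (iii): unit pitch around the examined peak."""
    return period / n_peaks


def phase_discrepancy(t_peak, t_prev_a, pitch):
    """Step (iv): fractional part of the peak offset measured in pitches."""
    q = (t_peak - t_prev_a) / pitch
    return q - math.floor(q)


def coefficient_is_valid(t_peak, t_prev_a, pitch, resolution=1.0):
    """Step (v): accept when the phase discrepancy is near 0 or 1.

    `resolution` is the time-axis resolution in scans; dividing it by the
    pitch to obtain a phase tolerance is an assumed interpretation.
    """
    tol = resolution / pitch
    phase = phase_discrepancy(t_peak, t_prev_a, pitch)
    return phase <= tol or phase >= 1.0 - tol
```

For instance, with an A--A period of 100 scans and a prior unit pitch of 20.4 scans, `peak_number` gives 5 and `local_unit_pitch` gives 20 scans; a transformed peak lying 40.5 scans after the previous A peak is then accepted, whereas one lying 50 scans after it is rejected.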

STANDARD LANE EXCHANGE (STEP S42)

If the repeated calculation/examination procedures lead the calculating domain out of the preset short nucleotide region (in the experiments, the region is set to scans 1500-2012, i.e., 2 hours and 5 minutes to 2 hours and 47 minutes of sample electrophoresis, where emergence order exchange never occurs), the standard lane is changed for a retry (in the experiments, programmed as A→T→G→C).
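
The retry over candidate standard lanes can be sketched as a simple loop. The callback `try_lane` is a hypothetical placeholder for the calculation and examination steps described above; only the order A, T, G, C is taken from the text.

```python
def calibrate_with_lane_exchange(try_lane, order=("A", "T", "G", "C")):
    """Try each candidate standard lane in turn.

    try_lane(lane) is a caller-supplied function that returns a dict of
    validated coefficients for the other three lanes, or None when the
    calculation domain left the short-fragment region without success.
    Returns (lane, coefficients), or (None, None) if every lane failed.
    """
    for lane in order:
        coefficients = try_lane(lane)
        if coefficients is not None:
            return lane, coefficients
    return None, None
```

Returning `(None, None)` when every lane fails is a choice made for this sketch; the text does not state what the program does in that case.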

TIME-AXIS TRANSFORMATION (STEP S38)

The entire time axes are transformed according to the calibration coefficients calculated and examined by the above algorithm, and the sequence is then obtained from the transformed data.

Thus, the calibration coefficients are so examined that correct ones areemployed for improving accuracy in base sequencing.

Although the present invention has been described and illustrated indetail, it is clearly understood that the same is by way of illustrationand example only and is not to be taken by way of limitation, the spiritand scope of the present invention being limited only by the terms ofthe append claims.

I claim:
 1. A base sequencing apparatus for dividing labelled nucleic acid fragment samples into four electrophoresis lanes in response to types of end bases and simultaneously gel-electrophoresing said samples while scanning signals outputted from said nucleic acid fragment samples in a direction perpendicular to the electrophoresis direction for acquiring said signals in on-line real time, thereby sequencing bases in a data processing unit, wherein said data processing unit evaluates calibration coefficients of time bases of respective said electrophoresis lanes from differences between positions of signals being already outputted in a range causing no sequence inversion and positions of substantially regular intervals for originally outputting signals, for calibrating time bases as to respective subsequent electrophoresis lanes with said calibration coefficients.
 2. A base sequencing apparatus for dividing labelled nucleic acid fragment samples into four electrophoresis lanes in response to types of end bases and simultaneously gel-electrophoresing said samples while scanning signals outputted from said nucleic acid fragment samples in a direction perpendicular to the electrophoresis direction for acquiring said signals in on-line real time, thereby sequencing bases in a data processing unit, wherein said data processing unit comprises: signal storage means for storing signals from respective said electrophoresis lanes with respect to time; maximum signal detection means for detecting times providing maximum values of said signals as to respective said electrophoresis lanes; maximum signal time storage means for storing said maximum signal times; appearance time estimation means selecting one of said four electrophoresis lanes as a reference lane and assuming that there is no difference of mobility between said electrophoresis lanes for calculating appearance times of maximum signals of the remaining three electrophoresis lanes appearing between two maximum signals of said reference lane on the time base of the reference lane; calibration coefficient calculation means for calculating calibration coefficients from ratios of said maximum signal times of said three electrophoresis lanes calculated in said appearance time estimation means to actual maximum signal times; time base calibration means for calibrating said time bases of said three electrophoresis lanes with said calibration coefficients; and base sequencing means for sequencing bases from said maximum signal times of said reference lane and said three electrophoresis lanes based on calibrated said time bases.
 3. A base sequencing apparatus in accordance with claim 2, wherein said appearance time estimation means performs calculation on the assumption that said maximum signals of said three electrophoresis lanes appear between two maximum signals of said reference lane at regular time intervals.
 4. A base sequencing apparatus in accordance with claim 2, wherein said calibration coefficient calculation means calculates said calibration coefficients every time a maximum signal appears on said reference lane between said signal and a preceding maximum signal of said reference lane, and said time base calibration means calibrates said time bases of said three electrophoresis lanes being effective until a next maximum signal appears on said reference lane.
 5. A base sequencing apparatus in accordance with claim 2, wherein said calibration coefficient calculation means once calculates calibration coefficients in portions having short base lengths, and said time base calibration means calibrates said time bases with said calibration coefficients with respect to portions having long base lengths.
 6. A base sequencing apparatus in accordance with claim 1, further comprising means for converting time bases in such a case that electrophoresis conditions are changed to time bases in such manner that electrophoresis conditions are constant.
 7. A base sequencing apparatus in accordance with claim 2, further comprising means for converting time bases in such a case that electrophoresis conditions are changed to time bases in such a case that electrophoresis conditions are constant.
 8. A base sequencing apparatus for dividing labelled nucleic acid fragment samples, comprising: four electrophoresis lanes for accepting different types of end bases; excitation and detection system for scanning signals outputted from the labelled nucleic acid fragment samples for acquiring said signals; and a calibration unit for calibrating said signals in real time to eliminate the effects of smiling to more accurately read a base sequence than a base sequencing apparatus not provided with such a signal calibration unit.
 9. A base sequencing apparatus according to claim 8, wherein said signal calibration unit evaluates the calibration coefficients of time values of the bases in the respective electrophoresis lanes to determine the rate coefficients of the electrophoresis lanes from differences between positions of detected signals already outputted in a range determined to cause no sequence inversion.
 10. A base sequencing apparatus according to claim 9, wherein said signal calibration unit includes a data processing unit, said data processing unit comprising: signal storage means for storing signals from respective electrophoresis lanes with respect to time values; maximum signal detection means for detecting time values providing maximum values of said signals as to respective electrophoresis lanes; maximum signal storage means for storing maximum signal time values; appearance time estimation means selecting one of said electrophoresis lanes as a reference lane and assuming that there is no difference of mobility between said electrophoresis lanes for calculating appearance time values of maximum signals of the remaining three electrophoresis lanes appearing between two maximum signals of said reference lane on the time basis of the reference lane; calibration coefficient calculation means for calculating calibration coefficients from ratios of said maximum signal time values of said three electrophoresis lanes calculated in said appearance time estimation means to actual maximum signal time values; time base calibration means for calibrating said time basis of said three electrophoresis lanes with said calibration coefficients; and base sequencing means for sequencing bases from said maximum signal time values of said reference lane and said three electrophoresis lanes based on calibrated said time values basis.
 11. A base sequencing apparatus according to claim 8, wherein said signal calibration unit includes a data processing unit, said data processing unit comprising: signal storage means for storing signals from respective electrophoresis lanes with respect to time values; maximum signal detection means for detecting time values providing maximum values of said signals as to respective electrophoresis lanes; maximum signal storage means for storing maximum signal time values; appearance time estimation means selecting one of said electrophoresis lanes as a reference lane and assuming that there is no difference of mobility between said electrophoresis lanes for calculating appearance time values of maximum signals of the remaining three electrophoresis lanes appearing between two maximum signals of said reference lane on the time basis of the reference lane; calibration coefficient calculation means for calculating calibration coefficients from ratios of said maximum signal time values of said three electrophoresis lanes calculated in said appearance time estimation means to actual maximum signal time values; time base calibration means for calibrating said time basis of said three electrophoresis lanes with said calibration coefficients; and base sequencing means for sequencing bases from said maximum signal time values of said reference lane and said three electrophoresis lanes based on calibrated said time values basis.
 12. A base sequencing apparatus according to claim 8, wherein said excitation and detection system scans said signals outputted from the labelled nucleic acid fragment samples in a direction perpendicular to an electrophoresis direction.
 13. A base sequencing apparatus according to claim 12, wherein said excitation and detection system comprises a laser with means for directing a laser beam onto an electrophoresis gel selectively at four bands in a direction perpendicular to said electrophoresis direction, and a photomultiplier for detecting said signals outputted from the nucleic acid fragment.
 14. A base sequencing apparatus according to claim 8, wherein said calibration unit comprises a computer, and including an amplifier and A/D converter connected between said photomultiplier and said computer.
 15. A base sequencing apparatus according to claim 13, wherein said laser beam directing means comprises an obliquely positioned mirror relative to said laser and a condenser positioned therebetween mounted on a scanning stage which is reciprocated in a direction perpendicular to said electrophoresis direction, and an objective lens, interference filter and condenser lens is positioned in a sequence between said electrophoresis gel and said photomultiplier.
 16. A base sequencing apparatus for dividing labelled nucleic acid fragment samples, comprising: four electrophoresis lanes for dividing the labelled nucleic acid fragment samples in response to types of end bases and for simultaneously gel-electrophoresing said samples; scanning means for scanning signals outputted from said nucleic acid fragment samples in a direction perpendicular to the electrophoresis direction for acquiring said signals in on-line real time; a data processing unit for sequencing bases in on-line real time, said data processing unit comprising: signal storage means for storing signals from respective electrophoresis lanes with respect to time; maximum signal time detection means for detecting time values and providing maximum values of said signal of respective electrophoresis lanes; appearance time estimation means selecting one of said electrophoresis lanes as a standard lane and assuming that there is no difference of mobility between said electrophoresis lanes for calculating appearance times of maximum signals of the remaining electrophoresis lanes appearing between two maximum signals of said standard lane with the time base of said standard lane; calibration coefficient calculating means for calculating calibration coefficients from ratios of said maximum signal times of said three electrophoresis lanes calculated in said appearance time estimation means to actual maximum signal times; means for converting time bases of signals in time domains other than those used for calculating said calibration coefficients through calculated calibration coefficients; means for deciding validity of said calibration coefficients depending on whether or not converted said signals appear at uniform time intervals; total time base calculation means for calibrating total time bases of said electrophoresis lanes with calibration coefficients being decided as valid; and base sequencing means for sequencing bases from said maximum signal times of said four electrophoresis lanes based on calibrated time bases.
 17. A base sequencing apparatus according to claim 16, wherein said means for deciding validity of said calibration coefficients makes a decision of inadequacy when diffusion or standard deviation of difference between estimated appearance times of calculated peaks and actual appearance times of converted signals exceeds a constant value.
 18. A base sequencing apparatus according to claim 16, further comprising means for changing said standard lane when all said calibration coefficients of said three lanes are not decided to be valid as to set said standard lane. 